The problem of automatic clinical caption generation is addressed by a proposed model that combines frontal X-ray scans with structured patient information from the radiology records. We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records. The proposed combination of these models produces a textual summary that contains the found pathologies, their locations, and 2D heatmaps localizing each pathology on the original X-ray scans. The proposed model was tested on two medical datasets, Open-I and MIMIC-CXR, as well as the general-purpose MS-COCO. The results, measured with natural language evaluation metrics, demonstrate its efficient applicability to chest X-ray image captioning.
Image quality assessment (IQA) metrics are widely used to quantitatively estimate the extent of image degradation following some forming, restoration, transformation, or enhancement algorithm. We present PyTorch Image Quality (PIQ), a usability-centric library that contains the most popular modern IQA algorithms, guaranteed to be correctly implemented according to their original propositions and thoroughly verified. In this paper, we detail the principles behind the library's foundation, describe the evaluation strategy that makes it reliable, provide benchmarks showing performance-time trade-offs, and underline the benefits of GPU acceleration given the PyTorch backend. PyTorch Image Quality is open-source software: https://github.com/photosynthesis-team/piq/.
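To make concrete what a full-reference IQA metric computes, here is a minimal plain-Python sketch of PSNR, one of the simplest such metrics; the function name and flattened-list interface are illustrative, not PIQ's actual API (PIQ provides verified, GPU-accelerated implementations of this and many more metrics).

```python
import math

def psnr(reference, distorted, data_range=1.0):
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE).
    `reference` and `distorted` are flat lists of pixel intensities;
    `data_range` is the maximum possible pixel value (MAX)."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images: infinite PSNR
    return 10 * math.log10(data_range ** 2 / mse)
```

Higher values mean the distorted image is closer to the reference; identical inputs yield infinity.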
The generalizability of deep learning models can be severely affected by the distributional difference between the training (source domain) and test (target domain) sets, e.g., when the sets are produced by different hardware. Due to this domain shift, a given model may perform well on data from one clinic and then fail when deployed in another. We propose a very transparent approach to perform test-time domain adaptation. The idea is to substitute the target low-frequency Fourier-space components, which are believed to reflect the style of an image. To maximize performance, we implement a "best style donor" selection technique and use a number of source data points to change the appearance of a single target scan (multi-source transferring). We study the effect of domain-shift severity on the method's performance and show that our training-free approach reaches the same state-of-the-art level as complex deep domain adaptation models. The code for our experiments is released.
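The low-frequency substitution idea above can be sketched in a few lines of NumPy: keep the target image's phase (which carries the content) and replace only the central, low-frequency amplitudes with those of a source "style donor". This is a minimal illustration of the general technique; the function name and the `beta` window-size parameter are assumptions, not the paper's exact settings.

```python
import numpy as np

def transfer_low_freq_style(target, source, beta=0.1):
    """Swap the low-frequency Fourier amplitudes of `target` with those
    of `source`, keeping the target's phase. `beta` sets the half-width
    of the swapped central square as a fraction of the image size."""
    f_t = np.fft.fftshift(np.fft.fft2(target))
    f_s = np.fft.fftshift(np.fft.fft2(source))
    amp_t, phase_t = np.abs(f_t), np.angle(f_t)
    amp_s = np.abs(f_s)
    h, w = target.shape
    b = int(min(h, w) * beta)
    ch, cw = h // 2, w // 2
    # Replace only the centered (low-frequency) amplitude block.
    amp_t[ch - b:ch + b + 1, cw - b:cw + b + 1] = \
        amp_s[ch - b:ch + b + 1, cw - b:cw + b + 1]
    f_mixed = amp_t * np.exp(1j * phase_t)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f_mixed)))
```

Using the target itself as the donor is a no-op, which makes the behavior easy to sanity-check.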
Image quality assessment (IQA) algorithms aim to reproduce the human perception of image quality. The growing popularity of image enhancement, generation, and restoration models has prompted the development of many methods to assess their performance. However, most IQA solutions are designed to predict image quality in the general domain, and their applicability to specific areas, such as medical imaging, remains questionable. Moreover, the selection of these IQA metrics for a specific task typically involves deliberately induced distortions, such as manually added noise or artificial blurring; yet the metrics thus chosen are later used to judge the output of real-life computer vision models. In this work, we aspire to fill these gaps by carrying out the most extensive IQA evaluation study for magnetic resonance imaging (MRI) to date (14,700 subjective scores). We use the outputs of neural network models trained to solve problems relevant to MRI, including image reconstruction in scan acceleration, motion correction, and denoising. Our emphasis is on reflecting the radiologist's perception of the reconstructed images; we assess the criteria of MRI scan quality that have the greatest impact on diagnostics: signal-to-noise ratio, contrast-to-noise ratio, and the presence of artifacts. Seven trained radiologists assessed these distorted images, and their verdicts were then correlated with 35 different image quality metrics (full-reference, no-reference, and distribution-based metrics were considered). The top performers, DISTS, HaarPSI, VSI, and FID-VGG16, were found to be efficient across all three proposed quality criteria, for all considered anatomies and target tasks.
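Agreement between a metric's scores and radiologists' judgments is conventionally quantified with rank correlation. As a self-contained illustration (not the study's actual evaluation code), here is a plain-Python Spearman correlation: the Pearson correlation of the tie-averaged ranks.

```python
def rankdata(values):
    """1-based average ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

Because it operates on ranks, any monotone relationship between metric scores and subjective scores yields a correlation of 1, which is exactly the property one wants when metrics and human scores live on different scales.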
Training generative adversarial networks (GANs) requires large amounts of data, stimulating the development of new augmentation methods to alleviate the challenge. Often, these methods either fail to produce enough new data or expand the dataset beyond the original manifold. In this paper, we propose a new augmentation method that guarantees, via optimal transport theory, that the new data stay within the original data manifold. The proposed algorithm finds cliques in the nearest-neighbor graph and, at each sampling iteration, randomly draws one clique to compute a Wasserstein barycenter with random uniform weights. These barycenters then become the new natural-looking elements that can be added to the dataset. We apply this approach to the problem of landmark detection, augmenting the available annotations in both unpaired and semi-supervised scenarios. In addition, the idea is validated on cardiac data for the medical segmentation task. Our approach reduces overfitting and improves quality metrics beyond the original-data results and beyond the results achieved with popular modern augmentation methods.
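For annotated landmark sets with fixed point correspondence, the Wasserstein barycenter with given weights reduces to the weighted average of corresponding points, which makes the sampling step of the scheme above easy to sketch. The function below is an illustrative sketch under that assumption; clique discovery in the nearest-neighbor graph is assumed to have been done elsewhere, and the name `sample_barycenter` is ours.

```python
import random

def sample_barycenter(clique, rng=random.Random(0)):
    """Given a clique of corresponding landmark sets (each a list of
    (x, y) points), draw random uniform weights, normalize them, and
    return the weighted-average configuration, i.e. the barycenter for
    aligned point sets with identity correspondence."""
    w = [rng.random() for _ in clique]
    s = sum(w)
    w = [x / s for x in w]  # weights sum to 1
    n_pts = len(clique[0])
    return [
        (
            sum(wi * shape[p][0] for wi, shape in zip(w, clique)),
            sum(wi * shape[p][1] for wi, shape in zip(w, clique)),
        )
        for p in range(n_pts)
    ]
```

By construction each sampled point is a convex combination of clique members, so new annotations stay inside the convex hull of the clique, matching the stay-on-manifold motivation.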
Aims. With the upcoming deluge of molecular emission data from (sub)millimeter observatories and infrared spectra from the James Webb Space Telescope, fast forward models of the chemical composition of protoplanetary disks are vital. Methods. We used a thermochemical modeling code to generate a diverse population of protoplanetary disk models. We trained a k-nearest-neighbors (KNN) regressor to instantly predict the chemistry of other disk models. Results. We show that it is possible to accurately reproduce the chemistry using only a small subset of physical conditions, thanks to the correlations between the local physical conditions in the adopted protoplanetary disk models. We discuss the uncertainties and limitations of this method. Conclusions. The proposed method can be used for Bayesian fitting of line emission data to retrieve disk properties from observations. We present a pipeline for reproducing the same approach on other sets of disk chemical models.
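The emulation step is conceptually simple: given training pairs of physical conditions and computed chemical abundances, a KNN regressor predicts abundances for new conditions by averaging the nearest training points. Below is a minimal pure-Python stand-in for that regressor (real pipelines would use, e.g., scikit-learn's `KNeighborsRegressor`); the function name and unweighted averaging are our illustrative choices.

```python
def knn_predict(train_X, train_y, query, k=3):
    """Predict a target vector for `query` by averaging the targets of
    its k nearest training points under squared Euclidean distance.
    train_X: list of feature vectors (physical conditions);
    train_y: list of target vectors (e.g. chemical abundances)."""
    neighbors = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )[:k]
    dim = len(train_y[0])
    return [sum(train_y[i][d] for i in neighbors) / k for d in range(dim)]
```

Since the prediction is a lookup plus an average, evaluation is effectively instantaneous compared with rerunning a thermochemical code, which is what enables Bayesian fitting loops.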
Scaling language models with more data, compute, and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared with dense variants. The largest GLaM has 1.2 trillion parameters, approximately 7x larger than GPT-3. It consumes only 1/3 of the energy used to train GPT-3, requires half of the computation FLOPs for inference, and still achieves better overall zero-shot and one-shot performance across 29 NLP tasks.
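The core of the sparse-activation idea is that a gate scores all experts but only the top-k of them are actually evaluated per input, so compute scales with k rather than with the number of experts. The toy layer below illustrates that mechanism in plain Python; it is a conceptual sketch, not GLaM's actual architecture (GLaM gates per token inside a Transformer).

```python
import math

def moe_layer(x, gate_w, experts, top_k=2):
    """Toy sparsely activated mixture-of-experts layer.
    x: input vector; gate_w: one weight row per expert;
    experts: list of callables, each mapping x to an output vector.
    Only the top_k experts by gate probability are evaluated, and
    their outputs are mixed with renormalized gate weights."""
    # Gate logits: one dot-product score per expert.
    logits = [sum(wi * xi for wi, xi in zip(w_row, x)) for w_row in gate_w]
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]  # stable softmax
    z = sum(probs)
    probs = [p / z for p in probs]
    top = sorted(range(len(experts)), key=lambda i: -probs[i])[:top_k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:  # only top_k experts run
        y = experts[i](x)
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out
```

With hundreds of experts and top_k = 2, the parameter count grows with the expert pool while the per-input FLOPs stay roughly constant, which is the trade-off behind GLaM's training-cost savings.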
In this paper we explore the task of modeling (semi) structured object sequences; in particular we focus our attention on the problem of developing a structure-aware input representation for such sequences. In such sequences, we assume that each structured object is represented by a set of key-value pairs which encode the attributes of the structured object. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key, over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling - TVM) and then self-attend over the set of key-conditioned value sequences to create a representation of the structured object sequence (Key Aggregation - KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two-part training results in better performance than a unified network with hierarchical encoding, as well as over other methods that use a {\em record-view} representation of the sequence \cite{de2021transformers4rec} or a simple {\em flattened} representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks, along with detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
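The re-indexing from a sequence of objects to per-key value sequences can be sketched directly: each structured object is a dict of key-value pairs, and the output holds one value sequence per key. This is an illustrative data-preparation sketch of the input layout implied by Temporal Value Modeling, with hypothetical naming and None-padding for missing keys.

```python
def key_conditioned_sequences(objects, keys=None):
    """Re-view a sequence of structured objects (each a dict of
    key -> value) as one temporal value sequence per key.
    Keys absent from an object are padded with None so every
    sequence has the same length as the object sequence."""
    if keys is None:
        # Universe of keys observed anywhere in the sequence.
        keys = sorted({k for obj in objects for k in obj})
    return {k: [obj.get(k) for obj in objects] for k in keys}
```

Each resulting per-key sequence is what TVM would encode independently, before KA self-attends across the set of key-conditioned representations.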
Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods to better aid clinicians in their decision-making to improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.
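The degradation step, Gaussian windowing in the spectral domain, can be sketched in NumPy: transform an A-scan to the spectral domain, multiply by a Gaussian window (narrowing the effective bandwidth), and transform back, which blurs the signal axially. This is a minimal sketch of the general technique; the function name and the `rel_sigma` parameter are illustrative, not the paper's exact settings.

```python
import numpy as np

def simulate_reduced_axial_resolution(a_scan, rel_sigma=0.15):
    """Apply a centered Gaussian window to the spectral-domain
    representation of a 1-D A-scan and transform back. A narrower
    window (smaller rel_sigma, the window std as a fraction of the
    spectrum length) emulates a narrower source bandwidth and hence
    lower axial resolution."""
    spectrum = np.fft.fftshift(np.fft.fft(a_scan))
    n = len(spectrum)
    idx = np.arange(n) - n // 2
    window = np.exp(-0.5 * (idx / (rel_sigma * n)) ** 2)
    return np.real(np.fft.ifft(np.fft.ifftshift(spectrum * window)))
```

Multiplying by the window in the spectral domain is equivalent to convolving the A-scan with a Gaussian-like point spread function, so a sharp reflector is smeared out but stays centered at the same depth.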
We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments and loop detection in simultaneous localization and mapping. The loop detection sub-task is especially relevant when a robot with an on-board RGB-D camera can drive past the same place (``Point") at different angles. The dataset is based on the popular Habitat simulator, in which it is possible to generate photorealistic indoor scenes using both its own sensor data and open datasets, such as Matterport3D. To study the main stages of solving the place recognition problem on the HPointLoc dataset, we proposed a new modular approach named PNTR. It first performs image retrieval with the Patch-NetVLAD method, then extracts keypoints and matches them using R2D2, LoFTR or SuperPoint with SuperGlue, and finally performs a camera pose optimization step with TEASER++. Such a solution to the place recognition problem has not been previously studied in existing publications. The PNTR approach has shown the best quality metrics on the HPointLoc dataset and has a high potential for real use in localization systems for unmanned vehicles. The proposed dataset and framework are publicly available: https://github.com/metra4ok/HPointLoc.